TEMPLATE MATCHING WITH NOISY PATCHES: A CONTRAST-INVARIANT GLR TEST 
Matching patches from a noisy image to atoms in a dictionary of patches is a key ingredient of many techniques in image processing and computer vision. By representing with a single atom all patches that are identical up to a radiometric transformation, dictionary size can be kept small, thereby retaining good computational efficiency. Identification of the atom in best match with a given noisy patch then requires a contrast-invariant criterion. In the light of detection theory, we propose a new criterion that ensures contrast invariance and robustness to noise. We discuss its theoretical grounding and assess its performance under Gaussian, gamma and Poisson noise.

Index Terms — Template matching, Likelihood ratio test, 
Detection theory, Image restoration 

1. INTRODUCTION 

In this paper, we address the problem of template matching of patches under various noise conditions. More precisely, given a collection of noise-free templates (the dictionary), we focus on finding, for a given noisy patch, the best matching element in the dictionary. Template matching is at the heart of many recent image processing and computer vision techniques, for instance for denoising [1] or classification with a labeled dictionary [2]. We focus in the following on how to perform template matching when the noise departs from the Gaussian distribution. Inspired by our previous work on the comparison of noisy patches [3], we extend here the proposed methodology to the problem of template matching.

By $x$, we denote a patch of an image, i.e., a collection of $N$ noisy pixel values. By $a \in \mathcal{D}$, we denote a template taken from a dictionary $\mathcal{D}$ ($a$ also has $N$ pixels). We do not specify here a shape but consider that the values are ordered so that, when a patch $x$ is compared to a template $a$, values with identical index are in spatial correspondence. For best efficiency, dictionaries should be as small as possible while being representative of images. To limit the size of dictionaries, a common idea is to let atoms represent a class of patches that are identical up to a radiometric transformation. Hence, a template should essentially encode the geometrical patterns of a patch rather than its radiometry. Of course, to exploit such a dictionary, the template matching criterion must be invariant to the radiometric changes considered while being robust to the noise statistics.

We assume that the noise can be modeled by a (known) distribution so that a noisy patch $x$ is a realization of an $N$-dimensional random variable $X$ modeled by a probability density or mass function $p(\cdot \mid \theta)$. The vector of parameters $\theta$ is referred to in the following as the noise-free patch. For example, a patch $x$ damaged by additive white Gaussian noise with standard deviation $\sigma$ can be modeled by:

$$x = \theta + \sigma n \qquad (1)$$

where $\theta$ is the noise-free patch and $n$ is the realization of a zero-mean normalized Gaussian random vector with independent elements. It is straightforward to see that $X \mid \theta$ follows a Gaussian distribution with mean $\theta$ and standard deviation $\sigma$. While such decompositions exist for some specific distributions (e.g., the gamma distribution involves a multiplicative decomposition), in most cases no decomposition of $x$ in terms of $\theta$ and an independent noise component may be found (e.g., under Poisson noise). In general, when noise departs from additive Gaussian noise, the link between $X$ and $\theta$ is described by the probability density or mass function $p(x \mid \theta)$.
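As an illustration of the three noise models considered in this paper, the following minimal sketch draws a noisy patch from a noise-free patch. The function name `sample_noisy_patch` is hypothetical, and the gamma model is assumed here to have shape L and mean equal to the noise-free value (the usual multiplicative speckle model); neither is prescribed by the text above.

```python
import numpy as np

def sample_noisy_patch(theta, noise, rng, sigma=1.0, L=1):
    """Draw one realization x of X given the noise-free patch theta."""
    theta = np.asarray(theta, dtype=float)
    if noise == "gaussian":
        # Additive decomposition x = theta + sigma * n, as in eq. (1).
        return theta + sigma * rng.standard_normal(theta.shape)
    if noise == "gamma":
        # Multiplicative decomposition x = theta * s with E[s] = 1
        # (assumption: shape L, scale 1 / L).
        return theta * rng.gamma(shape=L, scale=1.0 / L, size=theta.shape)
    if noise == "poisson":
        # No decomposition into theta and an independent noise term exists.
        return rng.poisson(theta).astype(float)
    raise ValueError(f"unknown noise model: {noise}")

rng = np.random.default_rng(0)
theta = np.full(10000, 20.0)   # constant noise-free patch, for illustration only
samples = {m: sample_noisy_patch(theta, m, rng, sigma=2.0, L=4)
           for m in ("gaussian", "gamma", "poisson")}
```

In all three cases the mean of the realizations is the noise-free value, but only the Gaussian and gamma models admit a decomposition into the noise-free patch and an independent noise term.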

2. PROBLEM DEFINITION 

A template matching criterion $c$ defines a mapping from a pair $(x, a)$ formed by a noisy patch and a template to a real value. The larger the value of $c(x, a)$, the more relevant the match between $x$ and the template $a$. We consider that a matching criterion $c$ is invariant with respect to the family of transformations $T_\rho$ parametrized by the vector $\rho$, if

$$\forall x, a, \rho, \quad c(x, T_\rho(a)) = c(x, a) .$$

A typical example is to consider invariance up to an affine change of contrast: $T_\rho(a) = T_{\alpha,\beta}(a) = \alpha a + \beta \mathbb{1}$, where $\mathbb{1}_k = 1$ for all $1 \le k \le N$. In the light of detection theory, we consider that a noisy patch $x$ and a template $a$ are in match (up to a transformation $T_\rho$) when $x$ is a realization of a random variable $X$ following a distribution $p(\cdot \mid \theta)$ for which there exists a vector of parameters $\rho$ such that $\theta = T_\rho(a)$. The template matching problem can then be rephrased as the following hypothesis test (a parameter test):

$$H_0 : \exists \rho, \; \theta = T_\rho(a) \quad \text{(null hypothesis)},$$
$$H_1 : \forall \rho, \; \theta \neq T_\rho(a) \quad \text{(alternative hypothesis)}.$$

For a given template matching criterion $c$ and a detection threshold $\tau$, the probability of false alarm (to decide $H_1$ under $H_0$) and the probability of detection (to decide $H_1$ under $H_1$) are defined as:

$$P_{FA} = \mathbb{P}(c(X, a) < \tau \mid \rho, H_0), \qquad (2)$$
$$P_D = \mathbb{P}(c(X, a) < \tau \mid \theta, H_1). \qquad (3)$$

Note that the inequality symbols are reversed compared to usual definitions since we consider detection of mismatch based on the matching measure $c$.
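The two probabilities above can be estimated empirically from criterion values computed on pairs known to be in match and pairs known to be in mismatch. The sketch below is one possible way to do so; the function name `empirical_roc` and the toy values are hypothetical.

```python
import numpy as np

def empirical_roc(c_match, c_mismatch, thresholds):
    """Empirical P_FA and P_D for each threshold tau.

    c_match:    criterion values c(x, a) for pairs drawn under H0.
    c_mismatch: criterion values c(x, a) for pairs drawn under H1.
    A mismatch (H1) is decided when c(x, a) < tau.
    """
    c_match = np.asarray(c_match, dtype=float)
    c_mismatch = np.asarray(c_mismatch, dtype=float)
    taus = np.asarray(thresholds, dtype=float)[:, None]
    p_fa = np.mean(c_match[None, :] < taus, axis=1)      # decide H1 under H0
    p_d = np.mean(c_mismatch[None, :] < taus, axis=1)    # decide H1 under H1
    return p_fa, p_d

# Toy criterion values, for illustration only.
p_fa, p_d = empirical_roc([0.9, 0.8, 0.4, 0.95],
                          [0.1, 0.5, 0.3, 0.85],
                          [0.0, 0.45, 0.9, 1.0])
```

Sweeping the threshold over the range of observed criterion values traces the ROC curve used in the evaluation section.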

According to the Neyman-Pearson theorem, the optimal criterion, i.e., the criterion which maximizes $P_D$ for any given $P_{FA}$, is the likelihood ratio (LR) test:

$$\mathcal{L}(x, a) = \frac{p(x \mid \theta = T_\rho(a), H_0)}{p(x \mid \theta, H_1)} . \qquad (4)$$

The application of the likelihood ratio test requires the knowledge of $\rho$ and $\theta$ (the parameters of the transformation and the noise-free patch) which, of course, are unavailable. Our problem is thus a composite hypothesis problem. A criterion maximizing $P_D$ for all $P_{FA}$ and all values of the unknown parameters is said to be uniformly most powerful (UMP). Kendall and Stuart (1979) showed that no UMP detector exists in general for our composite hypothesis problem [4], so that any criterion can be defeated by another one at a specific $P_{FA}$. The search for a universal template matching criterion is then futile. We address here the question of how different criteria behave on patches extracted from natural images.

3. CONTRAST-INVARIANT TEMPLATE MATCHING 

In this section we consider radiometric changes $T_{\alpha,\beta}$ defined by two parameters: $\alpha$ and $\beta$. We present different candidate criteria for contrast-invariant template matching and discuss their robustness to the noise statistics.

Normalized correlation: The most usual way to measure similarity up to an affine change of contrast of the form $T_{\alpha,\beta}(x) = \alpha x + \beta \mathbb{1}$ between two (non-constant) vectors $x$ and $a$ is to consider their normalized correlation:

$$\mathcal{C}(x, a) = \left| \frac{\sum_{k=1}^{N} (x_k - \bar{x})(a_k - \bar{a})}{\sqrt{\sum_{k=1}^{N} (x_k - \bar{x})^2 \, \sum_{k=1}^{N} (a_k - \bar{a})^2}} \right| \qquad (5)$$

where $\bar{x} = \frac{1}{N} \sum_k x_k$ and $\bar{a} = \frac{1}{N} \sum_k a_k$. Indeed, it is straightforward to show that the correlation provides the desired contrast invariance property. Regarding noise corruptions, it is not obvious whether the correlation is a robust template matching criterion. We will show that, under the assumption of Gaussian noise, for a fixed observation $x$, the vector $a \in \mathcal{D}$ that maximizes the correlation also maximizes the likelihood up to an affine change of contrast.
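The contrast invariance of the normalized correlation can be checked numerically. The following sketch assumes that the absolute value of the correlation coefficient is used, so that invariance also holds for negative values of the contrast parameter; the function name and the test data are hypothetical.

```python
import numpy as np

def normalized_correlation(x, a):
    """Absolute normalized correlation between two non-constant vectors."""
    x = np.asarray(x, dtype=float).ravel()
    a = np.asarray(a, dtype=float).ravel()
    xc = x - x.mean()
    ac = a - a.mean()
    return abs(xc @ ac) / np.sqrt((xc @ xc) * (ac @ ac))

rng = np.random.default_rng(1)
a = rng.uniform(0.0, 1.0, size=64)                 # hypothetical 8 x 8 template
x = 3.0 * a + 0.5 + 0.1 * rng.standard_normal(64)  # noisy, contrast-changed patch
c_ref = normalized_correlation(x, a)
# Same template after an affine change of contrast (alpha = -2, beta = 7).
c_affine = normalized_correlation(x, -2.0 * a + 7.0)
```

Both values agree up to rounding errors, since centering removes the offset and the normalization removes the scale.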

Generalized Likelihood Ratio: Motivated by the optimality guarantees of the LR test (4) and our previous work in [3], a template matching criterion can be defined from statistical detectors designed for composite hypothesis problems. The generalized LR (GLR) replaces the unknowns $\alpha$, $\beta$ and $\theta$ in eq. (4) by their maximum likelihood estimates (MLE) under each hypothesis:

$$\mathcal{G}(x, a) = \frac{\sup_{\alpha,\beta} p(x \mid \theta = T_{\alpha,\beta}(a), H_0)}{\sup_{t} p(x \mid \theta = t, H_1)} = \frac{p(x \mid \theta = T_{\hat\alpha,\hat\beta}(a))}{p(x \mid \theta = \hat{t})} \qquad (6)$$

where $\hat\alpha$, $\hat\beta$ and $\hat{t}$ are the MLE of the unknowns $\alpha$, $\beta$ and $\theta$. By construction, the GLR satisfies the contrast invariance property. Asymptotically with respect to the SNR, the GLR is optimal due to the efficiency of the MLE. Its asymptotic distribution is known and so are the $P_{FA}$ values associated with any given threshold $\tau$: the GLR is asymptotically a constant false alarm rate (CFAR) detector. The GLR test is also invariant under changes of variable: it does not depend on the representation of the noisy patch. While we noted that there are no UMP detectors for our composite hypothesis problem, the GLR is asymptotically UMP among invariant tests [6]. Due to its dependency on the MLE, the performance of the GLR may fall in low SNR conditions, where the MLE is known to behave poorly.

Stabilization: A classical approach to extend the applicability of a matching criterion to non-Gaussian noise is to apply a transformation to the noisy patches. The transformation is chosen so that the transformed patches follow a (close to) Gaussian distribution with constant variance (hence the name: variance-stabilization transforms). This leads for instance to the homomorphic approach, which maps multiplicative noise to additive noise with stationary variance. This is also the principle of the Anscombe transform and its variants used for Poisson noise. Given a mapping $s$ which stabilizes the variance for a specific noise distribution, stabilization-based criteria can be obtained by applying $\mathcal{C}$ or $\mathcal{G}$ to the output of $s$:

$$\mathcal{S}_{\mathcal{C}}(x, a) = \mathcal{C}(s(x), s(a)) , \qquad (7)$$
$$\mathcal{S}_{\mathcal{G}}(x, a) = \mathcal{G}(s(x), s(a)) , \qquad (8)$$

where the likelihood function $p(s(x) \mid s(\theta))$ is assumed to be a Gaussian distribution centered on $s(\theta)$ with covariance matrix $\sigma^2 I$. As we will see, an advantage of this approach compared to the GLR criterion is that it is usually simpler to evaluate in closed form, and thus leads to faster algorithms. An important limitation of this approach nevertheless lies in the existence of a stabilization function $s$. Beyond existence, the performance of this approach may fall if the distribution of the transformed data is far from the Gaussian distribution.
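To make the stabilization idea concrete, the sketch below applies two classical transforms and measures the standard deviation of the transformed data for several noise-free intensities. The specific forms used here (the Anscombe transform $2\sqrt{x + 3/8}$ for Poisson data and the logarithm for gamma data with shape L = 4) are standard choices assumed for this example; the text above does not fix which variants are used.

```python
import numpy as np

def anscombe(x):
    """Anscombe transform: Poisson data -> approximately unit-variance data."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def homomorphic(x):
    """Log transform: multiplicative noise -> additive noise of constant variance."""
    return np.log(np.asarray(x, dtype=float))

rng = np.random.default_rng(2)
intensities = (10.0, 50.0, 200.0)   # hypothetical noise-free values
# Standard deviation after stabilization, for each noise-free intensity.
poisson_std = [anscombe(rng.poisson(t, 100000)).std() for t in intensities]
gamma_std = [homomorphic(t * rng.gamma(4.0, 0.25, 100000)).std()
             for t in intensities]
```

After stabilization the standard deviation no longer depends on the noise-free intensity, which is what allows the Gaussian criteria (7) and (8) to be applied to the transformed patch and template.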



4. GLR IN DIFFERENT NOISE CONDITIONS 

In this section, we provide closed-form expressions or iter- 
ative schemes to evaluate the GLR in the case of Gaussian 
noise, gamma noise and Poisson noise. 

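Proposition 1 (Gaussian noise). Consider that $X$ follows a Gaussian distribution such that

$$p(x_k \mid \theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x_k - \theta_k)^2}{2\sigma^2} \right)$$

and consider the class of affine transformations $T_{\alpha,\beta}(x) = \alpha x + \beta \mathbb{1}$. In this case, we have

$$-\log \mathcal{G}(x, a) = \left( 1 - \mathcal{C}(x, a)^2 \right) \frac{\|x - \bar{x}\mathbb{1}\|_2^2}{2\sigma^2} .$$

Proof. For the Gaussian law, the MLE of $\theta$ is given by $\hat{t} = x$ so that

$$-\log \mathcal{G}(x, a) = \frac{\|x - \hat\alpha a - \hat\beta \mathbb{1}\|_2^2}{2\sigma^2}$$

and $\hat\alpha$ and $\hat\beta$ are the coefficients of the linear least-squares regression, i.e.,

$$\hat\alpha = \frac{\sum_{k=1}^{N} (x_k - \bar{x})(a_k - \bar{a})}{\sum_{k=1}^{N} (a_k - \bar{a})^2} \quad \text{and} \quad \hat\beta = \bar{x} - \hat\alpha \bar{a} ,$$

with $\bar{x}$ and $\bar{a}$ the empirical means of $x$ and $a$. Injecting the expressions of $\hat\alpha$ and $\hat\beta$ in the previous equation gives the proposed formula. □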
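The closed form of Proposition 1 can be checked against an explicit least-squares fit. The sketch below is an illustration under the Gaussian model; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def neg_log_glr_gaussian(x, a, sigma):
    """-log G(x, a) from the closed form: (1 - C^2) ||x - mean(x)||^2 / (2 sigma^2)."""
    x = np.asarray(x, dtype=float).ravel()
    a = np.asarray(a, dtype=float).ravel()
    xc, ac = x - x.mean(), a - a.mean()
    corr2 = (xc @ ac) ** 2 / ((xc @ xc) * (ac @ ac))   # squared normalized correlation
    return (1.0 - corr2) * (xc @ xc) / (2.0 * sigma ** 2)

def neg_log_glr_gaussian_lstsq(x, a, sigma):
    """Same quantity from an explicit least-squares fit of x by alpha * a + beta."""
    x = np.asarray(x, dtype=float).ravel()
    a = np.asarray(a, dtype=float).ravel()
    design = np.column_stack([a, np.ones_like(a)])
    (alpha, beta), *_ = np.linalg.lstsq(design, x, rcond=None)
    residual = x - alpha * a - beta
    return (residual @ residual) / (2.0 * sigma ** 2)

rng = np.random.default_rng(3)
a = rng.uniform(0.0, 1.0, size=64)                  # hypothetical template
x = 2.0 * a + 1.0 + 0.3 * rng.standard_normal(64)   # noisy patch with sigma = 0.3
v_closed = neg_log_glr_gaussian(x, a, sigma=0.3)
v_lstsq = neg_log_glr_gaussian_lstsq(x, a, sigma=0.3)
```

The two values coincide up to rounding errors, and the criterion is unchanged when the template is replaced by any affine transform of itself.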

Remark that for a fixed observation $x$ and any $a_1, a_2 \in \mathcal{D}$, if $\mathcal{C}(x, a_1) < \mathcal{C}(x, a_2)$ then $\mathcal{G}(x, a_1) < \mathcal{G}(x, a_2)$. In particular, we have

$$\operatorname*{argmax}_{a \in \mathcal{D}} \mathcal{G}(x, a) = \operatorname*{argmax}_{a \in \mathcal{D}} \mathcal{C}(x, a) = \operatorname*{argmax}_{a \in \mathcal{D}} \sup_{\alpha,\beta} p(x \mid \theta = T_{\alpha,\beta}(a), H_0)$$

which is the MLE under the hypothesis $H_0$. However, beyond the equivalence of their maxima, the GLR is not equivalent to the correlation, even in the case of Gaussian noise. They have different detection performance when the purpose is to take a decision by thresholding their answer. Compared to the correlation, the GLR adapts its answer with respect to $\|x - \bar{x}\mathbb{1}\|_2^2 / (2\sigma^2)$ which, in some sense, measures the signal-to-noise ratio (SNR) in $x$. For a fixed threshold $\tau$ and a fixed $\sigma$, if the SNR of $x$ is small enough, the GLR will put the pair $(x, a)$ in correspondence whatever their content. In fact, when the SNR is small enough, any template up to a radiometric transform can explain the observed realization. The correlation, which does not take into account the noise in its definition, does not adapt to the SNR of $x$. Worse, the correlation tends to increase when the SNR of $x$ decreases. We will see in Section 5 that such a behavior of the GLR is of main importance for a template matching task.



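Proposition 2 (Gamma noise). Consider that $X$ follows a gamma distribution such that

$$p(x_k \mid \theta_k) = \frac{L^L x_k^{L-1}}{\Gamma(L)\,\theta_k^L} \exp\left( -\frac{L x_k}{\theta_k} \right)$$

and consider the class of log-affine transformations $T_{\alpha,\beta}(x) = \beta x^\alpha$ where $(\cdot)^\alpha$ is the element-wise power function. In this case, we have

$$-\log \mathcal{G}(x, a) = L \sum_{k=1}^{N} \log\left( \frac{\hat\beta a_k^{\hat\alpha}}{x_k} \right)$$

where $\hat\alpha$ and $\hat\beta$ can be obtained iteratively as

$$\hat\alpha_{i+1} = \hat\alpha_i - \frac{\sum_k (1 - r_{k,i}) \log a_k}{\sum_k r_{k,i} \log^2 a_k} \quad \text{and} \quad \hat\beta_{i+1} = \frac{1}{N} \sum_k \frac{x_k}{a_k^{\hat\alpha_{i+1}}}$$

with $r_{k,i} = x_k / (\hat\beta_i a_k^{\hat\alpha_i})$, whatever the initialization.

Proof. For the gamma law, the MLE of $\theta$ is given by $\hat{t} = x$ so that

$$-\log \mathcal{G}(x, a) = L \sum_k \left( \log \frac{\hat\beta a_k^{\hat\alpha}}{x_k} + \frac{x_k}{\hat\beta a_k^{\hat\alpha}} - 1 \right) .$$

The function $\beta \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ has a unique minimum at $\frac{1}{N} \sum_k x_k / a_k^\alpha$. Moreover, the function $\alpha \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ is convex and twice differentiable, therefore the Newton method can be used to estimate $\alpha$ whatever the initialization. Differentiating twice $\alpha \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ gives the proposed iterative scheme. Injecting the values of $\hat\alpha$ and $\hat\beta$ in the previous equation gives the proposed formula, the last two terms cancelling after summation since $\sum_k x_k / (\hat\beta a_k^{\hat\alpha}) = N$. □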
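The iterative scheme of Proposition 2 can be sketched as follows. This is an illustration rather than a reference implementation: the function name is hypothetical, the initialization is a plain least-squares fit between $\log x$ and $\log a$ (not the log-moment initialization), and the template is first divided by its geometric mean, an assumption made here for numerical conditioning that only rescales $\beta$ and leaves the criterion unchanged.

```python
import numpy as np

def neg_log_glr_gamma(x, a, L, n_iter=30):
    """-log G(x, a) for gamma noise with L looks and transforms beta * a**alpha.

    Returns the criterion, alpha, and beta (beta is relative to the template
    rescaled by its geometric mean).
    """
    x = np.asarray(x, dtype=float).ravel()
    a = np.asarray(a, dtype=float).ravel()
    log_a = np.log(a)
    log_a = log_a - log_a.mean()       # assumption: geometric-mean normalization
    log_x = np.log(x)
    # Assumed initialization: least-squares slope between log x and log a.
    alpha = (log_a @ (log_x - log_x.mean())) / (log_a @ log_a)
    beta = np.mean(x * np.exp(-alpha * log_a))
    for _ in range(n_iter):
        r = x / (beta * np.exp(alpha * log_a))            # r_{k,i}
        alpha -= ((1.0 - r) @ log_a) / (r @ log_a ** 2)   # Newton step on alpha
        beta = np.mean(x * np.exp(-alpha * log_a))        # closed-form update of beta
    theta = beta * np.exp(alpha * log_a)
    return L * np.sum(np.log(theta / x)), alpha, beta

rng = np.random.default_rng(5)
a = rng.uniform(1.0, 4.0, size=64)                  # hypothetical template
x_clean = 2.0 * a ** 1.5                            # log-affine transform of the template
x_noisy = x_clean * rng.gamma(4.0, 0.25, size=64)   # gamma noise with L = 4
value, alpha_hat, beta_hat = neg_log_glr_gamma(x_noisy, a, L=4)
```

On a noise-free patch the criterion vanishes and the exponent is recovered exactly; on a noisy patch the value is non-negative and does not change when the template undergoes a log-affine change of contrast.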

Unlike in the case of the Gaussian law, there is no closed- 
form formula of GLR in the case of the gamma law and one 
should rather compute it iteratively. Note that in practice only 
a few iterations are required if one initializes using the log- 
moment estimation, as suggested in Q, leading to the fol- 
lowing initialization: 



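$$\hat\alpha_0 = \sqrt{\frac{\max\left( \sum_k (\log x_k - \overline{\log x})^2 - N\psi(1, L),\; 0 \right)}{\sum_k (\log a_k - \overline{\log a})^2}}$$

$$\hat\beta_0 = \exp\left( \overline{\log x} - \psi(L) + \log(L) - \hat\alpha_0 \, \overline{\log a} \right)$$

where $\overline{\log x} = \frac{1}{N} \sum_k \log x_k$ and $\overline{\log a} = \frac{1}{N} \sum_k \log a_k$, and where $\psi(\cdot)$ and $\psi(1, \cdot)$ denote the digamma and trigamma functions.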

Proposition 3 (Poisson noise). Consider that $X$ follows a Poisson distribution so that

$$p(x_k \mid \theta_k) = \frac{\theta_k^{x_k} e^{-\theta_k}}{x_k!}$$

and consider the class of log-affine transformations $T_{\alpha,\beta}(x) = \beta x^\alpha$. In this case, we have

$$-\log \mathcal{G}(x, a) = \sum_{k=1}^{N} x_k \log\left( \frac{x_k}{\hat\beta a_k^{\hat\alpha}} \right)$$

where $\hat\alpha$ and $\hat\beta$ can be obtained iteratively as

$$\hat\alpha_{i+1} = \hat\alpha_i - \frac{\sum_k (\hat\beta_i a_k^{\hat\alpha_i} - x_k) \log a_k}{\sum_k \hat\beta_i a_k^{\hat\alpha_i} \log^2 a_k} \quad \text{and} \quad \hat\beta_{i+1} = \frac{\sum_k x_k}{\sum_k a_k^{\hat\alpha_{i+1}}}$$

whatever the initialization.

Fig. 1. (a) Patch dictionary, (b) ROC curve obtained under Gaussian noise, (c) ROC curve obtained under gamma noise and (d) ROC curve obtained under Poisson noise. In all experiments, the SNR over the whole dictionary is about -3 dB.

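Proof. For the Poisson law, the MLE of $\theta$ is given by $\hat{t} = x$ such that

$$-\log \mathcal{G}(x, a) = \sum_k \left[ x_k \log\left( \frac{x_k}{\hat\beta a_k^{\hat\alpha}} \right) + \hat\beta a_k^{\hat\alpha} - x_k \right] .$$

The function $\beta \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ has a unique minimum at $\sum_k x_k / \sum_k a_k^\alpha$. Moreover, the function $\alpha \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ is convex and twice differentiable, such that the Newton method can be used to estimate $\alpha$ whatever the initialization. Differentiating twice $\alpha \mapsto \sum_k -\log p(x_k \mid \theta_k = \beta a_k^\alpha)$ gives the proposed iterative scheme. Injecting the values of $\hat\alpha$ and $\hat\beta$ in the previous equation gives the proposed formula, the last two terms cancelling after summation since $\sum_k \hat\beta a_k^{\hat\alpha} = \sum_k x_k$. □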
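A possible implementation of the Poisson case alternates a Newton step on the exponent with the closed-form update of the scale. The sketch below is only illustrative: the function name is hypothetical, zero counts are clipped to 0.5 before taking logarithms for the initialization, the template is divided by its geometric mean (which only rescales $\beta$), and the patch is assumed to contain at least one non-zero count.

```python
import numpy as np

def neg_log_glr_poisson(x, a, n_iter=30):
    """-log G(x, a) for Poisson noise and transforms beta * a**alpha.

    Returns the criterion, alpha, and beta (beta is relative to the template
    rescaled by its geometric mean).
    """
    x = np.asarray(x, dtype=float).ravel()
    a = np.asarray(a, dtype=float).ravel()
    log_a = np.log(a)
    log_a = log_a - log_a.mean()           # assumption: geometric-mean normalization
    log_x = np.log(np.maximum(x, 0.5))     # assumption: clip zero counts for the init
    # Assumed initialization: least-squares slope between log x and log a.
    alpha = (log_a @ (log_x - log_x.mean())) / (log_a @ log_a)
    beta = x.sum() / np.exp(alpha * log_a).sum()
    for _ in range(n_iter):
        theta = beta * np.exp(alpha * log_a)
        alpha -= ((theta - x) @ log_a) / (theta @ log_a ** 2)   # Newton step on alpha
        beta = x.sum() / np.exp(alpha * log_a).sum()            # closed-form update of beta
    theta = beta * np.exp(alpha * log_a)
    pos = x > 0                            # convention: 0 * log(0) = 0
    return np.sum(x[pos] * np.log(x[pos] / theta[pos])), alpha, beta

rng = np.random.default_rng(6)
a = rng.uniform(1.0, 4.0, size=64)              # hypothetical template
x_clean = 2.0 * a ** 1.5                        # log-affine transform of the template
x_noisy = rng.poisson(x_clean).astype(float)    # Poisson counts
value, alpha_hat, beta_hat = neg_log_glr_poisson(x_noisy, a)
```

As in the gamma case, the criterion is zero for a noise-free patch, non-negative otherwise, and unchanged under a log-affine change of contrast of the template.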

Again there is no closed-form formula of the GLR, but in practice only a few iterations are required if one initializes with the $\alpha$ and $\beta$ that minimize the linear least-squares error between $\log x$ and $\log a$.



5. EVALUATION OF PERFORMANCE

5.1. Detection performance

We evaluate the relative performance of the correlation, GLR and the variance stabilization based matching criteria on a dictionary composed of 196 noise-free patches of size $N = 8 \times 8$. The noise-free patches have been obtained using k-means on patches extracted from the classical 512 x 512 Barbara image. Each noisy patch $x$ is a noisy realization of the noise-free patches under Gaussian, gamma or Poisson noise with an overall SNR of about -3 dB. Each template $a$ is a randomly transformed atom of the dictionary, up to an affine change of contrast for the experiments involving Gaussian noise, and up to a log-affine change of contrast under gamma or Poisson noise. All criteria are evaluated for all pairs $(x, a)$. The process is repeated 20 times with independent noise realizations and radiometric transformations.

The performance of the matching criteria is given in terms of their receiver operating characteristic (ROC) curve, i.e., the curve of $P_D$ with respect to $P_{FA}$, where we have relaxed the hypothesis test as

$$H_0 : \exists \alpha, \beta, \; \theta \approx T_{\alpha,\beta}(a) \quad \text{(null hypothesis)},$$
$$H_1 : \forall \alpha, \beta, \; \theta \not\approx T_{\alpha,\beta}(a) \quad \text{(alternative hypothesis)},$$

and where $\theta \approx T_{\alpha,\beta}(a)$ reads as: on average, the noise-free patch $T_{\alpha,\beta}(a)$ explains the realizations of $X$ almost as well as the actual noise-free patch $\theta$, and is measured by:

$$D_{KL}(\theta \,\|\, T_{\alpha,\beta}(a)) < \nu ,$$

where $D_{KL}$ is the Kullback-Leibler divergence and $\nu$ is a small value (chosen here equal to 0.02). Results are given in Figure 1. Even with Gaussian noise or with variance stabilization, the correlation behaves poorly in noisy conditions. The generalized likelihood ratio (GLR) is the most powerful criterion, followed by the GLR with variance stabilization.


5.2. Application to dictionary-based denoising 

We exemplify here the performance of GLR in a dictionary-based denoising task. The dictionary $\mathcal{D}$ is considered as describing a generative model of the patches $x$ of the noisy image, as realizations of $X$ following a distribution of parameter $\theta = T_{\alpha,\beta}(a)$ with $a \in \mathcal{D}$. Under this model, we suggest estimating each patch of the image as:

$$\hat\theta(x) = \frac{1}{Z} \sum_{a \in \mathcal{D}} \mathcal{G}(x, a)\, a^* \quad \text{with} \quad Z = \sum_{a \in \mathcal{D}} \mathcal{G}(x, a) \qquad (9)$$

where $a^* = T_{\hat\alpha,\hat\beta}(a)$ and $\hat\alpha$ and $\hat\beta$ are the MLE of $\alpha$ and $\beta$ used in the calculation of $\mathcal{G}(x, a)$. Equation (9) has a Bayesian interpretation as the posterior mean estimator:

$$\hat\theta(x) = \frac{\sum_{a \in \mathcal{D}} p(a^* \mid x)\, a^*}{\sum_{a \in \mathcal{D}} p(a^* \mid x)} \qquad (10)$$

considering a priori that the frequencies of the atoms of $\mathcal{D}$ are uniform in the image. The posterior mean is known to minimize the Bayesian least square error $\mathbb{E}\left[ \|\hat\theta(X) - \theta\|_2^2 \right]$.

Fig. 2. (a) Noisy input image damaged by gamma noise (PSNR=21.14). (b) Denoised image using the GLR after variance stabilization (PSNR=27.42). (c) Denoised image using the GLR adapted to gamma noise (PSNR=27.53). (d) Image composed of the atoms of the dictionary.

Figure 2 shows the denoising results obtained on a 128 x 128 image damaged by gamma noise (with $L = 10$) using (9) with the GLR adapted to gamma noise and with the GLR adapted to a Gaussian law after variance stabilization^1. The dictionary $\mathcal{D}$ is chosen as the set of all atoms extracted from a 128 x 128 image (a.k.a. an epitome) built following the transparent dead leaves model of [9]. This model ensures that the dictionary is shift invariant [10, 11] while representing information at different scales. As in [10, 11], we manipulate epitomes in the Fourier domain in order to evaluate eq. (9) efficiently. Finally, Fig. 2 shows that using the GLR for the gamma law or for the Gaussian law after stabilizing the variance are both satisfactory visually and in terms of PSNR.
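The patch estimator of eq. (9) amounts to a weighted average of the contrast-adjusted atoms, with weights given by the GLR. The sketch below illustrates it in the simplest setting, with the Gaussian GLR and affine contrast changes; it is not the epitome-based Fourier implementation used in the experiments, and the function name and the toy dictionary are hypothetical.

```python
import numpy as np

def estimate_patch_gaussian(x, atoms, sigma):
    """Weighted average of eq. (9), using the Gaussian GLR as weights."""
    x = np.asarray(x, dtype=float).ravel()
    atoms = np.asarray(atoms, dtype=float)            # shape (number of atoms, N)
    xc = x - x.mean()
    ac = atoms - atoms.mean(axis=1, keepdims=True)
    alpha = (ac @ xc) / np.sum(ac ** 2, axis=1)       # MLE of alpha for each atom
    # a* = alpha * a + beta with beta = mean(x) - alpha * mean(a).
    fitted = x.mean() + alpha[:, None] * ac
    log_g = -np.sum((x - fitted) ** 2, axis=1) / (2.0 * sigma ** 2)
    weights = np.exp(log_g - log_g.max())             # subtract the max for stability
    return (weights / weights.sum()) @ fitted

rng = np.random.default_rng(4)
atoms = rng.uniform(0.0, 1.0, size=(5, 64))           # hypothetical dictionary of 5 atoms
clean = 2.0 * atoms[3] - 1.0                          # affine transform of one atom
noisy = clean + 0.5 * rng.standard_normal(64)
estimate = estimate_patch_gaussian(noisy, atoms, sigma=0.5)
```

Subtracting the largest log-GLR before exponentiation does not change the normalized weights but avoids numerical underflow when the noise level is small.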

6. CONCLUSION 

Normalized correlation is widely used as a contrast-invariant criterion for template matching. We have shown that the GLR test provides a criterion that is more robust to noise. In the case of Gaussian noise, this criterion involves both a normalized correlation term and a term that evaluates the signal-to-noise ratio of the noisy data. Under non-Gaussian noise distributions, criteria derived from the GLR test are generally not known in closed form but require only a few iterations to be evaluated. When a variance stabilization technique can be employed, our numerical experiments show that good performance is reached using the Gaussian GLR after variance stabilization.

^1 When using stabilization, a debiasing step is performed following [8].